Data Science Startups funded by Y Combinator (YC) 2026

May 2026

Browse 24 of the top Data Science startups funded by Y Combinator.

We also have a Startup Directory where you can search through over 5,000 companies.

  • Lotas
    Y Combinator S2025
    Active • 2 employees • San Francisco, CA, USA
    For our first product, we built Rao, an AI coding assistant integrated into RStudio, an IDE used by up to 5 million data scientists and statisticians who analyze data with the R programming language. Rao could read, write, and edit code while understanding the user's context (codebase and environment variables).
    data-science
    developer-tools
    data-visualization
    artificial-intelligence
  • Fleetline
    Y Combinator S2025
    Active • 7 employees • New York, NY, USA
    Fleetline is building the first complete-context algorithmic load planner for mid- to large-sized trucking fleets. Today’s dispatchers face an overwhelming task: balancing regulations, customer demands, live fleet data, and individual driver needs, all while coordinating across siloed tools and teams. Mistakes run rampant, and each mistake can cost thousands of dollars. Even the select fleets using algorithms with fleet-wide context struggle: the outputs are rigid, hard to interpret, and blind to real-world nuances like driver preferences and schedule exceptions. Fleetline solves this by combining advanced optimization with LLMs that can easily adapt algorithms to each fleet’s needs and capture driver-specific information. The result is smarter planning, truly optimized for every fleet and its drivers.
    data-science
    artificial-intelligence
    logistics
    supply-chain
  • Percival
    Y Combinator P2025
    Active • 3 employees • San Francisco, CA, USA
    Building an AI copilot for researchers to analyze and transform real-world data.
    artificial-intelligence
    data-science
    developer-tools
  • Plexe
    Y Combinator P2025
    Active • 2 employees • London
    Plexe builds predictive ML models from a problem description. It connects to data sources, conducts experiments, evaluates the resulting models, and deploys them to an API endpoint.
    ai
    machine-learning
    data-science
  • Klavis AI
    Y Combinator P2025
    Active • 3 employees
    Powering frontier AI labs with real world MCP environments and complex, long-horizon agentic tool-use data.
    reinforcement-learning
    data-science
    ai
    artificial-intelligence
  • Sherpa Labs
    Y Combinator W2025
    Active • 2 employees • New York City
    Sherpa Labs is building an agentic data team that automates modeling, operations, and discovery via a swarm of agents. Their initial product is a data catalog that enables developers and AI agents to quickly locate the right data sources, effortlessly trace data lineage, and understand messy data lakes — similar to Glean but for enterprise data systems.
    data-engineering
    data-science
    b2b
    saas
    ai
  • VortexifyAI
    Y Combinator F2024
    Active • 3 employees • New York, NY, USA
    Vortexify AI is a platform for building fully operational AI workflows tailored to supply chain operations. Deploy specialized, task-specific AI bots in days, not weeks. Our platform streamlines the creation of AI bots that can analyze millions of rows of data, manage complex, long-horizon processes, and collaborate seamlessly with humans in the loop. Each AI bot comes equipped with custom tools maintained through AI-powered code editors directly within the Vortexify platform. Development teams can instantly generate dashboards, data pipelines, machine learning models, and custom functions, all contextualized to business goals and data requirements. Bots can operate in Co-pilot or Agent (autonomous) mode. They can be scheduled or triggered by real-time alerts, and they are governed by robust guardrail templates, generated from natural language, that ensure safety, compliance, and reliability.
    iot
    data-science
    ai-assistant
    supply-chain
    ai
  • Zoa Research
    Y Combinator S2024
    Active • 5 employees • New York, NY, USA
    Historically, quantitative models have been domain-specific. Brilliant people spend their best years testing features, tuning hyperparameters, and iterating on architectures within a narrow domain. But scale is the panacea: large models will find patterns that people, and specialized models, could not. Forecasting generalizes. Zoa trains cross-domain event forecasting engines. *Automating Iteration*: LLMs, embedded in multi-agent optimization loops and evaluated against fixed policies, can automate the build-test-improve modeling cycle. Think AlphaEvolve for forecasting problems. *Sample-Efficient General Models*: Today’s forecasting models are narrowly crafted with deep human priors, but larger models will outperform state-of-the-art specialized models. Unlike existing event models, our models leverage data from across contexts and rely less on human intuition. And compared to LLMs, our models are built with more inductive priors and rely more heavily on inference-time compute, which improves sample efficiency. *Why It Matters*: In the real economy, our models could be useful for forecasting supply chain volatility, energy supply and demand, and even earthquake risk. Science is, as Ian Hacking writes, the taming of chance. It is the process of iteratively updating priors (something like: identify uncertainty, conceive an experiment to reduce it, execute, update). If science is uncertainty reduction, forecasting is a critical measure of progress. Better forecasting improves our ability to select interesting experiments (roughly, those with the greatest expected uncertainty reduction) and to update priors. Our models will be used by labs and academics in data-heavy domains. Sam's ex-girlfriend introduced him to Greg at Carnegie Mellon in 2017, and while that relationship didn't last, their friendship has. After college, Greg went to Harvard Law School, while Sam worked for three years on Jane Street's Options desk, building and leading a satellite dev team.
    ai
    data-science
  • Overstand Labs
    Y Combinator W2025
    Active • 4 employees • New York City
    Overstand is a data lab that allows our customers to navigate any set of data in just a few minutes. *Enterprise*: For enterprises, we unify Slack, email, calls, and operational data, then surface the signals that matter: customer needs, risks, and revenue opportunities hidden in everyday conversations. *Legal Firms*: For legal firms, we help them quickly understand their entire discovery corpus (either before or after document review) and build an initial case assessment and set of facts. Instead of waiting on reports or manual analysis, teams get immediate, evidence-backed answers from the data they already have. Overstand delivers clarity and leverage as your business scales.
    data-science
    data-engineering
    conversational-ai
    artificial-intelligence
    legaltech
  • Thunder Compute
    Y Combinator S2024
    Active • 4 employees • San Francisco, CA, USA
    One-click GPU instances with persistent storage, snapshots, and hot-swappable hardware, at the lowest prices anywhere.
    infrastructure
    cloud-computing
    developer-tools
    data-science
    artificial-intelligence
  • MinusX
    Y Combinator S2024
    Active • 2 employees • San Francisco, CA, USA
    MinusX is a Chrome extension that adds a side chat to your analytics apps (Jupyter, Metabase, Grafana, Tableau, etc.). Given an instruction, our agent operates your apps by clicking and typing, just like you do, to analyze data and answer queries. We believe an AI Data Scientist is a scientist, not yet another analytics platform. MinusX interoperates with you in the tools you already love and use and, as a matter of philosophy, gets out of the way.
    ai-assistant
    analytics
    data-science
    machine-learning
    ai
  • Mica AI
    Y Combinator S2024
    Active • 3 employees • San Francisco, CA, USA
    Mica's AI agents replace the data ops teams that fix bad data. When bad or missing data breaks the pipeline, and orchestration, retries, and monitoring fail, painful manual review kicks in, pulling humans in to investigate and patch data issues across systems. Mica does what those humans do: it gathers the right information from internal docs and external systems, reasons across context, and resolves errors autonomously to get the pipeline moving again. The result: dramatically reduced time, cost, and operational drag as your data pipelines scale, without scaling ops headcount. Mica turns judgment-heavy data fixes from a manual bottleneck into an automated background process with full auditability.
    data-engineering
    data-science
    enterprise-software
  • Metofico
    Y Combinator W2024
    Active • 2 employees • London, UK
    Metofico provides a no-code data analysis tool tailored for the life sciences. Our platform enables life scientists to analyse complex, massive datasets and extract the insights they need without advanced programming skills. This accessibility helps both researchers new to data science and experts save months of work. Metofico aims to be the leading centralized platform for data analysis in life science research, covering a wide range of applications, from brain activity analysis (such as photometry and EEG) to AI-powered detection and tracking of research animals. Our vision is to accelerate research processes and enhance the quality of research outputs across the board. By streamlining complex data analysis and making it more accessible, we’re committed to driving forward scientific discovery and innovation.
    saas
    no-code
    data-science
    data-visualization
  • Preloop
    Y Combinator W2024
    Active • 2 employees • Seattle
    Only 2 out of 10 ML models make it from experiment to production. Preloop helps automate deployment, so companies realize more value from their machine learning teams and those teams can focus on science instead of engineering.
    artificial-intelligence
    developer-tools
    deep-learning
    machine-learning
    data-science
  • camelAI
    Y Combinator W2024
    Active • 3 employees • San Francisco, CA, USA
    For decades, companies have settled for software built for everyone, which means it's perfect for no one. They stitch together five SaaS tools to approximate one workflow, pay for seats they don't use, and file tickets with a data team just to answer a simple question. CamelAI is a different bet: your own AI software engineer, living on its own computer, building exactly what your business needs. You describe what you want. CamelAI builds it, deploys it to a live URL, and keeps it running. No developers required, no infrastructure to manage. Next week when your process changes, you ask again. The software changes with you. This is personal software: tools made for your team, your data, your workflows.
    ai
    data-visualization
    data-science
    saas
  • Cognitio Labs
    Y Combinator S2023
    Active • San Francisco, CA, USA
    Cognitio Labs is an applied AI research lab building real-time compliance infrastructure for regulated supply chains, serving as the first line of defense against recalls and regulatory failure. When a food contamination event happens, time is everything. Today, traceability relies on spreadsheets, PDFs, and manual logs, turning recalls into multi-day investigations. Entire product categories get destroyed, brands lose trust, and a single recall can cost $10M to $100M+. Why now: The FDA’s FSMA 204 rule requires companies to produce standardized digital traceability records within 24 hours by 2028, impacting over 60,000 U.S. food facilities. The regulatory bar is rising, but the infrastructure to meet it does not exist. Our first product line uses sensors and AI to automatically capture key events such as temperature, handling, and lot-level movements across production, storage, and transit. We convert fragmented operational data into standardized, FSMA 204-compliant traceability records in real time. Compliance is generated as operations happen, not after. This becomes the first line of defense by enabling faster recalls, reducing spoilage, eliminating manual compliance work, and protecting contracts, insurance, and brand equity. We are starting with food and expanding into other regulated, high-risk supply chains.
    ai-assistant
    aiops
    b2b
    data-science
    artificial-intelligence
  • Sohar Health
    Y Combinator S2023
    Active • 8 employees • New York, NY, USA
    Sohar Health is an AI-driven front-end revenue cycle management (RCM) solution that transforms insurance verification for healthcare providers, enabling faster patient conversions and reducing administrative workloads. With a 95% automation rate, our API-first platform integrates seamlessly into existing workflows to provide real-time claim accuracy and eligibility checks. Key performance metrics: a median latency of just 6 seconds, enabling real-time eligibility checks; over 90% of checks returned within 30 seconds, increasing patient conversion; 96% accuracy in identifying and verifying benefits details; industry-leading 99% accuracy for eligibility determination; a >90% carve-out detection rate, mitigating surprise-billing risk; and a 60% success rate with our Insurance Discovery API, which helps identify coverage for self-pay patients. Our customers include outpatient clinics, specialty practices, and digital health platforms looking to streamline front-end claim management. By reducing errors and maximizing clean claim submissions, Sohar Health empowers healthcare organizations to convert and retain more patients, reduce operating costs, and increase their top-line revenue.
    artificial-intelligence
    digital-health
    api
    health-tech
    data-science
  • Mito
    Y Combinator S2020
    Active • 3 employees • New York, NY, USA
    Mito is Cursor for data science. We’re building an AI-enabled IDE to 10x the productivity of data people. Data analysts use Mito to automate reports without relying on internal engineering resources. Mito is used by thousands of business analysts, data scientists, and automation engineers at some of the world's largest banks, private equity shops, and consulting firms. Open source is key to our enterprise sales strategy. Mito is open source and built on top of Jupyter. That means getting started with Mito is as simple as running `pip install` and doesn't require enterprises to manage new infrastructure. Check us out at: https://www.trymito.io/
    analytics
    open-source
    developer-tools
    data-science
    ai
  • Cellbyte
    Y Combinator W2022
    Active • 3 employees • Munich, Germany
    Cellbyte's AI agents help pharma companies launch new drugs worldwide. Market-leading firms use Cellbyte to answer questions like “What price can we achieve for our new drug in the U.S. versus Germany?” The three co-founders have known and worked with each other for many years: Daniel brings 5+ years of industry experience from his previous job at Simon-Kucher, a leading global life sciences consultancy. Felix has sold $3m+ ACV deals to customers like H&M at his previous YC startup. Samuel holds an MSc in Information Systems from TUM and has built enterprise AI applications from scratch as an ML Engineer at Celonis.
    healthcare-it
    data-science
    ai
  • Basedash
    Y Combinator S2020
    Active • 6 employees • Montreal, QC, Canada
    Basedash is the AI-native Business Intelligence platform. Create dashboards and instantly understand your customers using natural language. Connect 500+ data sources, ask a question, and let Basedash visualize the answer.
    b2b
    saas
    data-visualization
    data-science
    ai
  • Centaur
    Y Combinator W2019
    Active • 45 employees • Boston, MA, USA
    The best AI models aren’t just trained and evaluated with human data; they’re built with superhuman data. The strongest datasets emerge through collective intelligence, where humans and machines work together to outperform either one alone. At Centaur, we create superior quality data by turning annotation into an arena where experts and AI compete.
    data-labeling
    crowdsourcing
    data-science
    artificial-intelligence
  • Dost Education
    Y Combinator W2017
    Active • 30 employees • Delhi, India
    Dost is an ed-tech nonprofit building a platform to expand access to early childhood development in low-resource settings through parent education. Our mission is to unlock children’s full potential by focusing on early learning, the period when 90% of the brain develops. We believe that parents of any literacy level can play a critical role in developing their children’s school and life readiness. In India alone, there are 150 million under-resourced caregivers who can benefit from resources developed for them. That’s why our team at Dost Education (educators, entrepreneurs, and engineers) is passionate about using technology and user-centric product design to change the trajectory of families’ lives. Since 2017, we’ve grown from a small pilot with a few hundred mothers to reaching over 100,000 families through work with state governments. Join us as we continue to innovate, grow our reach, and deepen our impact. We are supported by some of the best funders in the tech and social impact space, including Y Combinator, Mulago, and many others. Dost Education values diversity. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
    education
    india
    nonprofit
    data-science
  • Nova Credit
    Y Combinator S2016
    Active • 100 employees • New York City
    Nova Credit is a credit infrastructure and analytics company that enables businesses to grow responsibly through alternative credit data. As a Consumer Reporting Agency (CRA), Nova Credit leverages its unique data infrastructure, compliance framework, and credit expertise to help lenders fill critical gaps in traditional credit analytics. The company transforms the fragmented universe of consumer financial data into compliant, actionable risk insights through a comprehensive platform designed to increase conversion through expanded coverage, speed, and reliability. Leading organizations, including HSBC, RBC, SoFi, Scotiabank, Appfolio, and Yardi, work with Nova Credit to make smarter credit decisions through cash flow underwriting with Cash Atlas™, quickly verify income with Income Navigator, and reach new-to-country consumers with Credit Passport®. Nova Credit is backed by investors including Kleiner Perkins, General Catalyst, Index Ventures, and Canapi as well as executives from Goldman Sachs, JPMorgan, and Citi. Learn more at www.novacredit.com or reach out to connect@novacredit.com.
    fintech
    data-science
  • Pachyderm
    Y Combinator W2015
    Acquired • 60 employees • San Francisco, CA, USA
    Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you.
    machine-learning
    data-science
    developer-tools